Keyword-based document clustering

Author

  • Seung-Shik Kang
Abstract

Document clustering aggregates related documents into clusters by evaluating the similarity between each document and the representatives of the clusters. Terms and their discriminating features are the clues for clustering, and those features are derived from term and document frequencies. Feature selection based on frequency statistics limits how far a clustering algorithm can be improved, because it does not consider the content of the objects being clustered. In this paper, we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms the frequency-based weighting method.
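
As a rough illustration of the frequency-based baseline the abstract refers to, the sketch below weights terms by term frequency and inverse document frequency, then assigns each document to the most similar cluster representative by cosine similarity. It is not the paper's content-based method, which the abstract does not specify. The function names (`tfidf_vectors`, `cosine`, `assign`), the whitespace tokenizer, the toy documents, and the choice of two documents as cluster representatives are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each term by term frequency times inverse document frequency."""
    n = len(docs)
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign(vector, representatives):
    """Return the index of the most similar cluster representative."""
    return max(range(len(representatives)),
               key=lambda i: cosine(vector, representatives[i]))


# Toy data, invented for this sketch.
docs = [
    "apple banana fruit",
    "banana fruit salad",
    "engine car wheel",
    "car wheel road",
]
vectors = tfidf_vectors(docs)
# Assumption: documents 0 and 2 serve as the two cluster representatives.
representatives = [vectors[0], vectors[2]]
labels = [assign(v, representatives) for v in vectors]
print(labels)  # [0, 0, 1, 1]
```

In this baseline the weights depend only on counts, which is the limitation the abstract points to: two terms with the same frequency statistics receive the same weight regardless of their role in the document.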

Similar resources

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Log based Keyword Extraction and Spread based Clustering for an Efficient Information Searching

Today, efficient information search is very important for extracting and analyzing user requirements in the vast amount of web information. For this reason, this paper proposes a log based keyword extraction method which finds the associated keywords in a certain domain. This paper also proposes a spread based clustering method for clustering the keywords with high association among the keyword-...

Simultaneous Categorization of Text Documents and Identification of Cluster-dependent Keywords

In this paper, we propose a new approach to unsupervised text document categorization based on a coupled process of clustering and cluster-dependent keyword weighting. The proposed algorithm is based on the K-Means clustering algorithm. Hence it is computationally and implementationally simple. Moreover, it learns a different set of keyword weights for each cluster. This means that, as a by-pro...

Content Based Document Image Retrieval with Support Vectors Clustering

The goal of this paper is to present a suitable approach to content based document image retrieval. In the proposed algorithm, a feature vector is extracted with the wavelet transform for sub-words. Then, based on these features, sub-words are clustered with the support vector clustering (SVC) algorithm, and this approach is used for keyword-based search in the content based document retrieval problem. T...

Experiments in Clustering Documents for Automatic Acquisition of Lexical Semantic Networks for Polish

The aim of this work is to explore document clustering techniques for the needs of semi-automatic construction of a lexical semantic network for Polish. Although the majority of research in this area is based on measures of distributional similarity calculated from co-occurrences of words in large collections of documents, we wanted to approach a difficult problem of meaning ambiguity resolutio...

Publication date: 2003